Abstract. In this paper we study $k$-noncrossing, canonical RNA pseudoknot structures with
minimum arc-length $\ge 4$. Let $\mathsf{T}_{k,\sigma}^{[4]}(n)$ denote the number of these structures. We derive exact
enumeration results by computing the generating function
$\mathbf{T}_{k,\sigma}^{[4]}(z) = \sum_{n} \mathsf{T}_{k,\sigma}^{[4]}(n) z^n$ and derive
the asymptotic formulas $\mathsf{T}_{k,3}^{[4]}(n) \sim c_k\, n^{-(k-1)^2-\frac{k-1}{2}} (\gamma_{k,3}^{[4]})^{-n}$ for $k = 3, \dots, 9$.
In particular we have for $k = 3$, $\mathsf{T}_{3,3}^{[4]}(n) \sim c_3\, n^{-5}\, 2.0348^{n}$.
Our results prove that the set of biophysically relevant
RNA pseudoknot structures is surprisingly small and suggest a new structure class as target for
prediction algorithms.



1. Introduction 



RNA pseudoknot structures have drawn a lot of attention over the last decade [1]. From micro-RNA
binding to ribosomal frameshifts [20], we currently discover novel RNA functionalities at
truly amazing rates. Our conceptual understanding of RNA pseudoknot structures has not kept
up with this pace. Only recently the generating functions of $k$-noncrossing RNA structures of
arc-length $\ge 2$ [11], arc-length $\ge 4$ [S] and canonical $k$-noncrossing RNA structures of arc-length
$\ge 2$ [13] have been derived. While these combinatorial results open new perspectives for the design
of new folding algorithms, it has to be noted that realistic pseudoknot structures are subject to
a minimum arc-length $\ge 4$ and stack-length $\ge 3$. Therefore the above structure classes are not
"best possible". The lack of a transparent target class of RNA pseudoknot structures represents
a problem for ab initio prediction algorithms. There are four algorithms capable of the
energy-based prediction of certain pseudoknots in polynomial time: Rivas et al. (dynamic programming,
gap-matrices, $O(n^6)$ time and $O(n^4)$ space) [3TJ, Uemura et al. ($O(n^5)$ time and $O(n^4)$ space,
tree-adjoining grammars) [25], Akutsu [3] and Lyngsø [18]. All of them follow the dynamic programming
paradigm and none produces an easily specifiable class of pseudoknots as output.

In this paper we characterize a class of pseudoknot RNA structures in which bonds have a minimum
length of four and stacks contain at least three base pairs. Our results show that this structure
class is ideally suited as a priori output for prediction algorithms. Tab. 1 indicates that this class
remains suitable even for more complex pseudoknots (specified in terms of larger sets of mutually
crossing bonds). In fact, one can search RNA 3-noncrossing pseudoknot structures with arc-length
$\ge 4$ and stack-length $\sigma \ge 3$ for a sequence of length 100 w.r.t. a variety of objective functions (in
particular loop-based minimum free energy models) on a 4-core PC in a few minutes [10].

In order to put our results into context, we turn the clock back by almost three decades. In 1978
M. Waterman et al. [27, 28, 29, 30] began deriving the concepts for enumeration and prediction
of RNA secondary structures. The latter represent arguably the prototype of prediction targets of
RNA structures. RNA secondary structures are coarse grained structures which can be represented
as outer-planar graphs, diagrams, Motzkin-paths or words over ".", "(" and ")". Their decisive
feature is that they have no two crossing bonds, see Fig. 1. Let $\mathsf{T}_2^{[\lambda]}(n)$ denote the number of
secondary structures with arc-length $\ge \lambda$ over $[n] = \{1, \dots, n\}$. The key to RNA secondary
structures is the following recursion for $\mathsf{T}_2^{[\lambda]}(n)$:

(1.1) $\mathsf{T}_2^{[\lambda]}(n) = \mathsf{T}_2^{[\lambda]}(n-1) + \sum_{j=0}^{n-(\lambda+1)} \mathsf{T}_2^{[\lambda]}(n-2-j)\, \mathsf{T}_2^{[\lambda]}(j)$,

where $\mathsf{T}_2^{[\lambda]}(n) = 1$ for $0 \le n \le \lambda$. The latter follows from considering the concatenation of
Motzkin-paths with minimum peak length $\lambda - 1$. Eq. (1.1) implies for the generating function
$\mathbf{T}_2^{[\lambda]}(z) = \sum_{n \ge 0} \mathsf{T}_2^{[\lambda]}(n) z^n$ the functional equation

(1.2) $z^2\, \mathbf{T}_2^{[\lambda]}(z)^2 - (1 - z + z^2 + \cdots + z^{\lambda})\, \mathbf{T}_2^{[\lambda]}(z) + 1 = 0$

from which eventually

$\mathbf{T}_2^{[\lambda]}(z) = \dfrac{-1 + 2z - 2z^2 + z^{\lambda+1} + \sqrt{1 - 4z + 4z^2 - 2z^{\lambda+1} + 4z^{\lambda+2} - 4z^{\lambda+3} + z^{2\lambda+2}}}{2(z^3 - z^2)}$

follows. Therefore, minimum arc-length restrictions do not impose particular difficulties for RNA
secondary structures. In fact minimum stack size conditions can also be dealt with in a
straightforward way. We furthermore note that eq. (1.1) is a constructive recursion, i.e. it allows us to
inductively build secondary structures over $[n]$ from those over $[i]$, for all $i < n$.
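Recursion (1.1) and the functional equation (1.2) are easy to check numerically. The following Python sketch is an illustration only; the function names `secondary_structure_counts` and `functional_equation_residual` are ours and do not appear in the original text. It evaluates (1.1) directly and then verifies, coefficient by coefficient, that the resulting truncated power series satisfies (1.2).

```python
def secondary_structure_counts(n_max, lam):
    """Return [T(0), ..., T(n_max)] for minimum arc-length lam via eq. (1.1)."""
    t = [1] * (n_max + 1)                 # T(n) = 1 as long as no arc fits
    for n in range(1, n_max + 1):
        total = t[n - 1]                  # vertex n is isolated
        # vertex n is paired; j counts the vertices left of its arc
        for j in range(0, n - lam):       # j = 0, ..., n - (lam + 1)
            total += t[n - 2 - j] * t[j]
        t[n] = total
    return t


def functional_equation_residual(t, lam):
    """Coefficients of z^2 T^2 - (1 - z + z^2 + ... + z^lam) T + 1 up to len(t) - 1."""
    n_max = len(t) - 1
    b = [0] * (n_max + 1)                 # b = 1 - z + z^2 + ... + z^lam, truncated
    b[0] = 1
    if n_max >= 1:
        b[1] = -1
    for d in range(2, min(lam, n_max) + 1):
        b[d] = 1
    residual = []
    for n in range(n_max + 1):
        square = sum(t[i] * t[n - 2 - i] for i in range(n - 1))   # [z^n] z^2 T^2
        linear = sum(b[i] * t[n - i] for i in range(n + 1))       # [z^n] b T
        residual.append(square - linear + (1 if n == 0 else 0))
    return residual


counts = secondary_structure_counts(8, 2)
print(counts)
```

For $\lambda = 2$ the list starts 1, 1, 1, 2, 4, 8, 17, which can be confirmed by listing the structures by hand for small $n$.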
FIGURE 1. RNA secondary structures.

In order to analyze RNA structures with crossing bonds, we recall the notion of $k$-noncrossing
diagrams [11]. A $k$-noncrossing diagram is a labeled graph over the vertex set $[n]$ with vertex
degrees $\le 1$, represented by drawing its vertices $1, \dots, n$ in a horizontal line and its arcs $(i,j)$,
where $i < j$, in the upper half-plane, containing at most $k - 1$ mutually crossing arcs. The
vertices and arcs correspond to nucleotides and Watson-Crick (A-U, G-C) and (U-G) base pairs,
respectively.

FIGURE 2. $k$-noncrossing diagrams: we display a 4-noncrossing diagram with arc-length $\lambda \ge 4$ and
$\sigma \ge 1$ (upper) and a 3-noncrossing diagram with $\lambda \ge 4$ and $\sigma \ge 2$ (lower).

Diagrams have the following three key parameters: the maximum number of mutually
crossing arcs, $k - 1$, the minimum arc-length, $\lambda$, and the minimum stack-length, $\sigma$ ($(k, \lambda, \sigma)$-diagrams).
The length of an arc $(i,j)$ is $j - i$ and a stack of length $\sigma$ is a sequence of "parallel" arcs of the
form $((i,j), (i+1, j-1), \dots, (i + (\sigma - 1), j - (\sigma - 1)))$, see Fig. 2. We call an arc of length $\lambda$ a
$\lambda$-arc. Let $\mathcal{T}_{k,\sigma}^{[\lambda]}(n)$ denote the set of $k$-noncrossing diagrams with minimum arc- and stack-length
$\lambda$ and $\sigma$ and let $\mathsf{T}_{k,\sigma}^{[\lambda]}(n)$ denote their number.
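To make the three parameters concrete, the following Python sketch computes them for a diagram given as a list of arcs $(i,j)$ with $i < j$. It is a brute-force illustration for small inputs under our own naming (`crossing_number`, `min_arc_length`, `stack_lengths`, `is_structure` are hypothetical helpers, not part of the original text). The example diagram consists of two mutually crossing stacks of length three over twelve vertices.

```python
from itertools import combinations


def crossing_number(arcs):
    """Largest number of mutually crossing arcs (brute force, small inputs only)."""
    def cross(a, b):
        (i1, j1), (i2, j2) = sorted([a, b])
        return i1 < i2 < j1 < j2

    best = 1 if arcs else 0
    for size in range(2, len(arcs) + 1):
        if any(all(cross(a, b) for a, b in combinations(sub, 2))
               for sub in combinations(arcs, size)):
            best = size
        else:
            break                      # no larger mutually crossing set can exist
    return best


def min_arc_length(arcs):
    return min(j - i for i, j in arcs)


def stack_lengths(arcs):
    """Sorted lengths of the maximal stacks ((i,j), (i+1,j-1), ...)."""
    arc_set = set(arcs)
    lengths = []
    for (i, j) in arc_set:
        if (i - 1, j + 1) in arc_set:
            continue                   # not the outermost arc of its stack
        s = 1
        while (i + s, j - s) in arc_set:
            s += 1
        lengths.append(s)
    return sorted(lengths)


def is_structure(arcs, k, lam, sigma):
    """True if the diagram is k-noncrossing with arc-length >= lam and stack-length >= sigma."""
    if not arcs:
        return True
    return (crossing_number(arcs) <= k - 1
            and min_arc_length(arcs) >= lam
            and min(stack_lengths(arcs)) >= sigma)


delta = [(1, 9), (2, 8), (3, 7), (4, 12), (5, 11), (6, 10)]
print(crossing_number(delta), min_arc_length(delta), stack_lengths(delta))
```

The example is a $(3,4,3)$-diagram but not a $(2,4,3)$-diagram, since its two stacks cross.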

In the following, we shall identify pseudoknot RNA structures with $k$-noncrossing diagrams and
refer to them as $(k, \lambda, \sigma)$-structures. Pseudoknot RNA structures occur in functional RNA (RNAseP)
[17], ribosomal RNA [TH] and plant viral RNAs, and in vitro RNA evolution experiments have
produced families of RNA structures with pseudoknot motifs [23]. In Fig. 3 we give several
representations of the UTR-pseudoknot of the mouse hepatitis virus. Due to the crossings of arcs,
pseudoknots differ considerably from secondary structures: pseudoknot RNA structures are
inherently non-inductive and no analogue of eq. (1.1) exists.

FIGURE 3. UTR-pseudoknot structure of the mouse hepatitis virus.

One key for the generating function of
$k$-noncrossing RNA structures $\mathbf{T}_k^{[\lambda]}(z)$ was the bijection of Chen et al. jjj obtained in the context
of $k$-noncrossing partitions. This bijection has been generalized to $k$-noncrossing tangled diagrams
[5], a class of contact-structures tailored for expressing RNA tertiary interactions. Via the
bijection $k$-noncrossing RNA structures can be identified with certain walks in $\mathbb{Z}^{k-1}$ that remain in the
region

$\{(x_1, \dots, x_{k-1}) \in \mathbb{Z}^{k-1} \mid x_1 \ge x_2 \ge \cdots \ge x_{k-1} \ge 0\}$

starting and ending at 0, the boundaries of which are called walls. The enumeration of these
walks is obtained employing the reflection principle. This method is due to André in 1887 [2]
and has subsequently been generalized by Gessel and Zeilberger [7]. In the reflection principle
"bad", i.e. reflected, walks cancel themselves. In other words one enumerates all walks and due to
cancellation only the ones survive that never touch the walls. Despite its beauty this method does
not trigger any algorithmic intuition and is nonconstructive. Moreover, $k$-noncrossing RNA
structures cannot directly be enumerated via the reflection principle: it does not preserve a minimum
arc-length. In [11] it is shown how to eliminate specific classes of arcs after reflection. One
nontrivial implication of this theory is that all generating functions for $k$-noncrossing RNA structures
are $D$-finite, i.e. there exists a nonconstructive recurrence relation of finite length with polynomial
coefficients for $\mathsf{T}_k^{[\lambda]}(n)$. Note however, that although we can prove the existence of this recurrence
it is at present not known for any $k > 2$. In Fig. 4 we illustrate the key steps for the enumeration
of $k$-noncrossing RNA structures [11].
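The walk description can be turned into a small counting routine. The Python sketch below is an illustration under our own assumptions: it counts walks with unit steps $\pm e_i$ in $\mathbb{Z}^{k-1}$ that start and end at the origin and stay in the closed region $x_1 \ge \cdots \ge x_{k-1} \ge 0$, which is the standard encoding of $k$-noncrossing perfect matchings on $2n$ vertices. The function name `noncrossing_matchings` is ours. It uses plain dynamic programming rather than the reflection principle, so it is constructive but only practical for small $k$ and $n$.

```python
def noncrossing_matchings(k, n_max):
    """Return the numbers of k-noncrossing perfect matchings on 0, 2, ..., 2*n_max vertices."""
    d = k - 1
    origin = (0,) * d
    counts = {origin: 1}               # number of admissible walks ending at each lattice point
    result = [1]
    for step in range(1, 2 * n_max + 1):
        nxt = {}
        for pos, c in counts.items():
            for axis in range(d):
                for delta in (1, -1):
                    p = list(pos)
                    p[axis] += delta
                    # stay inside x1 >= x2 >= ... >= x_{k-1} >= 0
                    if p[-1] < 0 or any(p[a] < p[a + 1] for a in range(d - 1)):
                        continue
                    q = tuple(p)
                    nxt[q] = nxt.get(q, 0) + c
        counts = nxt
        if step % 2 == 0:
            result.append(counts.get(origin, 0))
    return result


print(noncrossing_matchings(2, 5))     # Catalan numbers for k = 2
print(noncrossing_matchings(3, 4))
```

For $k = 2$ the walks are Dyck paths and the routine returns the Catalan numbers; for $k = 3$ and six vertices it returns 14, i.e. all 15 perfect matchings except the one with three mutually crossing arcs.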




FIGURE 4. Exact enumeration of $k$-noncrossing RNA structures.



Once $\mathbf{T}_{k,\sigma}^{[4]}(z)$ is known we employ singularity analysis and study its dominant singularities, using
Hankel contours. This Ansatz has been pioneered by P. Flajolet and A.M. Odlyzko [BJ. Its basic
idea is the construction of a "singular analogue" of the Taylor expansion. It can be shown that,
under certain conditions, there exists an approximation which is locally of the same order as the
original function. This particular local approximation then allows us to derive the asymptotic form
of the coefficients. In our situation all conditions for singularity analysis are met, since all our
generating functions are $D$-finite [22, 31] and $D$-finite functions have an analytic continuation into
any simply-connected domain containing zero.

We will compute $\mathbf{T}_{k,\sigma}^{[4]}(z)$ and show that it has a unique dominant singularity, whose type
depends solely on the crossing number [13]. Singularity analysis then produces an array of
exponential growth rates indexed by $k$ and $\sigma$, summarized in Tab. 1. The ideas of this paper build
on those of [11, 13]. In [13] core-structures are introduced via which $\sigma$-canonical $k$-noncrossing
structures can be enumerated. $(k, 4, \sigma)$-structures where $\sigma \ge 3$ can however not be enumerated via
core-structures, see Fig. 5. This results from the fact that the core-map, obtained by identifying
stacks with single arcs, does not preserve arc-length. Therefore we have to introduce a new set of
$k$-noncrossing diagrams, denoted by $\mathcal{T}_k^{*}(n,h)$. This class is designed for inducing a new type of
cores, $\mathcal{C}_k^{*}(n',h')$ (see Theorem 3). Then we proceed using ideas similar to those in [13] and prove
k              3        4        5        6        7        8        9
$\sigma = 3$   2.0348   2.2644   2.4432   2.5932   2.7243   2.8414   2.9480
$\sigma = 4$   1.7898   1.9370   2.0488   2.1407   2.2198   2.2896   2.3523
$\sigma = 5$   1.6465   1.7532   1.8330   1.8979   1.9532   2.0016   2.0449
$\sigma = 6$   1.5515   1.6345   1.6960   1.7457   1.7877   1.8243   1.8569
$\sigma = 7$   1.4834   1.5510   1.6008   1.6408   1.6745   1.7038   1.7297
$\sigma = 8$   1.4319   1.4888   1.5305   1.5639   1.5919   1.6162   1.6376
$\sigma = 9$   1.3915   1.4405   1.4763   1.5049   1.5288   1.5494   1.5677

TABLE 1. Exponential growth rates of $(k, 4, \sigma)$-structures where $\sigma \ge 3$.



FIGURE 5. Core-structures will in general have 2-arcs: the structure $\delta \in \mathcal{T}_{3,3}^{[4]}(12)$ (lhs)
is mapped into its core $c(\delta)$ (rhs). Clearly $\delta$ has arc-length $\ge 4$ and as a consequence of
the collapse of the stack $((I+1, J+2), (I+2, J+1), (I+3, J))$ (the red arcs are being
removed) into the arc $(I+3, J)$, $c(\delta)$ contains the arc $(I, I+4)$, which is, after relabeling,
a 2-arc.

our exact enumeration result, Theorem 3. As for the singularity analysis the main contribution is
Claim 1 of Theorem 4, a new functional equation for $\mathbf{T}_{k,\sigma}^{[4]}(z)$.

2. Preliminaries 

In this section we provide some background on the generating functions of $k$-noncrossing matchings
[H [15] and $k$-noncrossing RNA structures [HI [12]. We denote the set (number) of $k$-noncrossing
RNA structures with arc-length $\ge \lambda$ and stack-size $\ge \sigma$ by $\mathcal{T}_{k,\sigma}^{[\lambda]}(n)$ ($\mathsf{T}_{k,\sigma}^{[\lambda]}(n)$). By abuse of notation
we omit the indices $\lambda$ and $\sigma$ in $\mathcal{T}_{k,\sigma}^{[\lambda]}(n)$ ($\mathsf{T}_{k,\sigma}^{[\lambda]}(n)$) for $\lambda = 2$ and $\sigma = 1$. A $k$-noncrossing
core-structure is a $k$-noncrossing RNA structure in which there exist no two arcs of the form $(i,j)$,
$(i+1, j-1)$. The set (number) of $k$-noncrossing core-structures and $k$-noncrossing core-structures with
exactly $h$ arcs is denoted by $\mathcal{C}_k(n)$ ($\mathsf{C}_k(n)$) and $\mathcal{C}_k(n,h)$ ($\mathsf{C}_k(n,h)$), respectively. Furthermore we
denote by $f_k(n, \ell)$ the number of $k$-noncrossing diagrams with arbitrary arc-length and $\ell$ isolated
vertices over $n$ vertices and set $\mathsf{M}_k(n) = \sum_{\ell=0}^{n} f_k(n, \ell)$. That is, $\mathsf{M}_k(n)$ is the number of all
$k$-noncrossing partial matchings. In Fig. 6 we display the various types of diagrams involved.




FIGURE 6. Basic diagram types: (A) 4-noncrossing matching (no isolated points), (B)
3-noncrossing partial matching (isolated points 4 and 9), (C) 4-noncrossing RNA structure
with arc-length $\ge 4$ and stack-length $\ge 1$, (D) RNA structure with arc-length $\ge 5$ and
stack-length $\ge 3$.



2.1. $k$-noncrossing partial matchings and RNA structures. The following identities are due
to Grabiner and Magyar [8]

(2.1) $\sum_{n \ge 0} f_k(n,0) \dfrac{x^n}{n!} = \det[I_{i-j}(2x) - I_{i+j}(2x)]\big|_{i,j=1}^{k-1}$

(2.2) $\sum_{n \ge 0} \Big\{ \sum_{\ell=0}^{n} f_k(n,\ell) \Big\} \dfrac{x^n}{n!} = e^{x} \det[I_{i-j}(2x) - I_{i+j}(2x)]\big|_{i,j=1}^{k-1}$

where $I_r(2x) = \sum_{j \ge 0} \frac{x^{2j+r}}{j!\,(r+j)!}$ denotes the hyperbolic Bessel function of the first kind of order $r$.
Eq. (2.1) and (2.2) allow only "in principle" for explicit computation of the numbers $f_k(n,\ell)$ and
in view of $f_k(n,\ell) = \binom{n}{\ell} f_k(n-\ell, 0)$ everything can be reduced to (perfect) matchings, where we
have the following situation: there exists an asymptotic approximation of the determinant of the
hyperbolic Bessel function for general $k$ due to [15] and employing the subtraction of singularities
principle [19] one can prove p~5]

(2.3) $\forall\, k \in \mathbb{N}; \quad f_k(2n,0) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (2(k-1))^{2n}, \quad c_k > 0,$
k                  2        3        4        5        6         7         8         9         10
$\gamma_k^{-1}$    2.6180   4.7913   6.8541   8.8875   10.9083   12.9226   14.9330   16.9410   18.9472

TABLE 2. The exponential growth rates of $(k, 2, 1)$-structures.

k                        4        5        6         7         8         9
$(\gamma_k^{[4]})^{-1}$  6.5290   8.6483   10.7176   12.7635   14.7963   16.8210

TABLE 3. The exponential growth rates of $(k, 4, 1)$-structures.



where $\rho_k = \frac{1}{2(k-1)}$ is the dominant real singularity of $\sum_{n \ge 0} f_k(2n,0) z^{2n}$. For $(k, 2, 1)$-structures
we have [TQ 112

(2.4) $\mathsf{T}_k(n) = \sum_{b=0}^{\lfloor n/2 \rfloor} (-1)^b \binom{n-b}{b} \mathsf{M}_k(n-2b)$

(2.5) $\mathsf{T}_k(n) \sim c_k\, n^{-(k-1)^2 - (k-1)/2}\, (\gamma_k^{-1})^{n}, \quad c_k > 0,$

where $\gamma_k$ is the unique, minimal solution of $\frac{z}{z^2 - z + 1} = \rho_k$, see Tab. 2. For $(k, 4, 1)$-structures we
have according to [5] the following exact enumeration result

(2.6) $\mathsf{T}_k^{[4]}(n) = \sum_{b \le \lfloor n/2 \rfloor} (-1)^b \lambda(n,b)\, \mathsf{M}_k(n-2b), \quad 4 \le k \le 9,$

where $\lambda(n,b)$ denotes the number of ways of selecting $b$ arcs of length $\le 3$ over $n$ vertices and

(2.7) $\mathsf{T}_k^{[4]}(n) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (\gamma_k^{[4]})^{-n}$

where 7^ is the unique positive, real solution of fz^~r~l) = Pfc where ri(z) satisfies 

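$u(z) = \sqrt{1 + 4z - 4z^2 - 6z^3 + 4z^4 + z^6}$

$r_1(z) = \dfrac{-2z^2 + z^3 - 1 + u(z)}{2(1 - 2z - z^2 + z^4)}.$

In Tab. 3 we present the exponential growth rates for $\mathsf{T}_k^{[4]}(n)$ for $k = 4, \dots, 9$. For $(k, 2, \sigma)$-structures
we have according to [12]

(2.8) $\mathbf{T}_{k,\sigma}(x) = \dfrac{1}{u_0 x^2 - x + 1} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{u_0}\, x}{u_0 x^2 - x + 1} \right)^{2n}$

where $u_0 = \frac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1}$ and

(2.9) $\mathsf{T}_{k,\sigma}(n) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (\gamma_{k,\sigma}^{-1})^{n}$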
k              2        3        4        5        6        7        8        9        10
$\sigma = 2$   1.9680   2.5881   3.0382   3.4138   3.7438   4.0420   4.3162   4.5715   4.8115
$\sigma = 3$   1.7160   2.0477   2.2704   2.4466   2.5955   2.7259   2.8427   2.9490   3.0469
$\sigma = 4$   1.5782   1.7984   1.9410   2.0511   2.1423   2.2209   2.2904   2.3529   2.4100
$\sigma = 5$   1.4899   1.6528   1.7561   1.8347   1.8991   1.9540   2.0022   2.0454   2.0845

TABLE 4. The exponential growth rates of $(k, 2, \sigma)$-structures [13].



where $\gamma_{k,\sigma}$ is a positive real dominant singularity of $\sum_{n \ge 0} \mathsf{T}_{k,\sigma}(n) x^n$ and the minimal positive real
solution of the equation

(2.10) $\dfrac{\sqrt{\dfrac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1}}\; x}{\left( \dfrac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1} \right) x^2 - x + 1} = \rho_k.$

In Table 4 we present the exponential growth rates of $(k, 2, \sigma)$-structures.
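The quantities of this subsection can be checked numerically for small $n$. The Python sketch below is only an illustration; all function names are ours. It obtains $f_k(2n,0)$ by counting lattice walks in the region $x_1 \ge \cdots \ge x_{k-1} \ge 0$ (our assumption about how the matchings are generated; the text itself only gives the determinant formula (2.1)), then evaluates $\mathsf{M}_k(n)$ via $f_k(n,\ell) = \binom{n}{\ell} f_k(n-\ell,0)$ and $\mathsf{T}_k(n)$ via eq. (2.4). The growth rate $\gamma_k^{-1}$ of Tab. 2 follows from rewriting $z/(z^2 - z + 1) = \rho_k$ as $z^2 - (2k-1)z + 1 = 0$.

```python
from math import comb, sqrt


def perfect_matchings(k, n_max):
    """[f_k(0,0), f_k(2,0), ..., f_k(2*n_max,0)] by counting walks in Z^(k-1)."""
    d = k - 1
    origin = (0,) * d
    counts, out = {origin: 1}, [1]
    for step in range(1, 2 * n_max + 1):
        nxt = {}
        for pos, c in counts.items():
            for axis in range(d):
                for delta in (1, -1):
                    p = list(pos)
                    p[axis] += delta
                    if p[-1] < 0 or any(p[a] < p[a + 1] for a in range(d - 1)):
                        continue          # left the region x1 >= ... >= x_{k-1} >= 0
                    nxt[tuple(p)] = nxt.get(tuple(p), 0) + c
        counts = nxt
        if step % 2 == 0:
            out.append(counts.get(origin, 0))
    return out


def partial_matchings(k, n_max):
    """[M_k(0), ..., M_k(n_max)] with M_k(n) = sum_l C(n, l) f_k(n - l, 0)."""
    f = perfect_matchings(k, n_max // 2)
    return [sum(comb(n, 2 * m) * f[m] for m in range(n // 2 + 1))
            for n in range(n_max + 1)]


def rna_structures(k, n_max):
    """[T_k(0), ..., T_k(n_max)] via the inclusion-exclusion formula (2.4)."""
    m = partial_matchings(k, n_max)
    return [sum((-1) ** b * comb(n - b, b) * m[n - 2 * b] for b in range(n // 2 + 1))
            for n in range(n_max + 1)]


def growth_rate(k):
    """1 / gamma_k, where gamma_k is the smaller root of z^2 - (2k - 1) z + 1 = 0."""
    a = 2 * k - 1
    gamma = (a - sqrt(a * a - 4)) / 2
    return 1 / gamma


print(rna_structures(2, 8))
print(rna_structures(3, 6))
print(round(growth_rate(2), 4), round(growth_rate(3), 4))
```

For $k = 2$ formula (2.4) reproduces the secondary structure numbers obtained from recursion (1.1) with $\lambda = 2$, which serves as a consistency check between the two approaches.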



2.2. Singularity analysis. Let us next recall some basic facts about analytic functions.
Pringsheim's Theorem [53] guarantees that each power series with positive coefficients has a positive
real dominant singularity. This singularity plays a key role for the asymptotics of the coefficients.
In the proof of Theorem 4 it will be important to deduce relations between the coefficients from
functional equations of generating functions. The theorems that deal with such deductions
are called transfer theorems [BJ. We consider a specific domain in which the functions in question
are analytic and which is "slightly" bigger than their respective disc of convergence. It is tailored
for extracting the coefficients via Cauchy's integral formula. Details on the method can be found
in [BJ. In case of $D$-finite functions we have analytic continuation in any simply connected domain
containing zero [26] and all prerequisites of singularity analysis are met. To be precise, given two
numbers $\phi$, $R$, where $R > 1$ and $0 < \phi < \frac{\pi}{2}$ and $\rho \in \mathbb{R}$, the open domain $\Delta_\rho(\phi, R)$ is defined as

(2.11) $\Delta_\rho(\phi, R) = \{ z \mid |z| < R,\ z \ne \rho,\ |\mathrm{Arg}(z - \rho)| > \phi \}$

A domain is a $\Delta_\rho$-domain if it is of the form $\Delta_\rho(\phi, R)$ for some $R$ and $\phi$. A function is $\Delta_\rho$-analytic
if it is analytic in some $\Delta_\rho$-domain. We use the notation

(2.12) $(f(z) = O(g(z))$ as $z \to \rho) \iff (f(z)/g(z)$ is bounded as $z \to \rho)$

and if we write $f(z) = O(g(z))$ it is implicitly assumed that $z$ tends to a (unique) singularity.
$[z^n]\, f(z)$ denotes the coefficient of $z^n$ in the power series expansion of $f(z)$ around 0.
k    $q_{0,k}(z)$                                                           $M_k$
3    $(1/4 - 4z^2)\, z^2$                                                   $\{\pm 1/4\}$
4    $(144z^4 - 40z^2 + 1)\, z^6$                                           $\{\pm 1/2, \pm 1/6\}$
5    $(-80z^2 + 1024z^4 + 1)\, z^8$                                         $\{\pm 1/4, \pm 1/8\}$
6    $(-4144z^4 + 140z^2 + 14400z^6 - 1)\, z^{10}$                          $\{\pm 1/2, \pm 1/6, \pm 1/10\}$
7    $(-1 - 12544z^4 + 224z^2 + 147456z^6)\, z^{12}$                        $\{\pm 1/4, \pm 1/8, \pm 1/12\}$
8    $(1 - 336z^2 + 31584z^4 + 2822400z^8 - 826624z^6)\, z^{14}$            $\{\pm 1/2, \pm 1/6, \pm 1/10, \pm 1/14\}$
9    $-(-480z^2 + 1 + 69888z^4 + 37748736z^8 - 3358720z^6)\, z^{16}$        $\{\pm 1/4, \pm 1/8, \pm 1/12, \pm 1/16\}$

TABLE 5. The polynomials $q_{0,k}(z)$ and their nonzero roots.



Theorem 1. [5] Let $f(z), g(z)$ be $D$-finite, $\Delta_\rho$-analytic functions with unique dominant singularity
$\rho$ and suppose

(2.13) $f(z) = O(g(z))$ for $z \to \rho$.

Then we have

(2.14) $[z^n]\, f(z) = K \left(1 - O(n^{-1})\right) [z^n]\, g(z)$,

where $K$ is some constant.

Let $\mathbf{F}_k(z) = \sum_n f_k(2n,0) z^{2n}$, the ordinary generating function of $k$-noncrossing matchings. It
follows from eq. (2.1) that the power series $\mathbf{F}_k(z)$ is $D$-finite, i.e. there exists some $e \in \mathbb{N}$ such that

(2.15) $q_{0,k}(z) \dfrac{d^e}{dz^e} \mathbf{F}_k(z) + q_{1,k}(z) \dfrac{d^{e-1}}{dz^{e-1}} \mathbf{F}_k(z) + \cdots + q_{e,k}(z)\, \mathbf{F}_k(z) = 0,$

where $q_{j,k}(z)$ are polynomials. The key point is that any dominant singularity of $\mathbf{F}_k(z)$ is contained
in the set of roots of $q_{0,k}(z)$ [22], which we denote by $M_k$. The polynomials $q_{0,k}(z)$ and their sets
of roots for $k = 3, \dots, 9$ are given in Table 5. Accordingly, $\mathbf{F}_k(z)$ has singularities $\pm\rho_k$, where
$\rho_k = (2(k-1))^{-1}$.

As a consequence of Theorem 1, eq. (2.3) and the so called supercritical case of singularity analysis
[5], VI.9., p. 400, we give the following result [11] tailored for our functional equations.
Theorem 2. Suppose $\vartheta_\sigma(z)$ is algebraic over $K(z)$, regular for $|z| < \delta$ and satisfies $\vartheta_\sigma(0) = 0$.
Suppose further $\gamma_{k,\sigma}$ is the unique solution with minimal modulus $< \delta$ of the two equations $\vartheta_\sigma(x) = \rho_k$
and $\vartheta_\sigma(x) = -\rho_k$. Then $\gamma_{k,\sigma}$ is the unique dominant singularity of $\mathbf{F}_k(\vartheta_\sigma(z))$ and

(2.16) $[z^n]\, \mathbf{F}_k(\vartheta_\sigma(z)) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (\gamma_{k,\sigma}^{-1})^{n}.$



3. Exact Enumeration 



In this section we present the exact enumeration of $(k, 4, \sigma)$-structures, where $\sigma \ge 3$. The
structure of our formula is analogous to the Möbius inversion formula proved in [13]:
$\mathsf{T}_{k,\sigma}(n,h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} \mathsf{C}_k(n-2b, h-b)$,
which relates the number of all structures and the number of core-structures. As we pointed out
in the introduction the latter cannot be used in order to
enumerate $k$-noncrossing structures with arc-length $\ge 4$, see Fig. 5. We consider the arc-sets

$\beta_2 = \{(i, i+2) \mid i+1 \text{ isolated}\}$ and $\beta_3 = \{(i, i+3) \mid i+1, i+2 \text{ isolated}\}$

and set $\beta = \beta_2 \cup \beta_3$. Furthermore

(3.1) $\mathcal{C}_k^{*}(n,h) = \{\delta \mid \delta \in \mathcal{C}_k(n,h);\ \delta \text{ contains no 1-arc and no } \beta\text{-arc}\}$

(3.2) $\mathcal{T}_k^{*}(n,h) = \{\delta \mid \delta \in \mathcal{T}_k(n,h);\ \delta \text{ contains no 1-arc and no } \beta\text{-arc}\}.$

Theorem 3. Suppose we have $k, h, \sigma \in \mathbb{N}$, $k \ge 2$, $h \le n/2$ and $\sigma \ge 3$. Then the number of
$(k, 4, \sigma)$-structures having exactly $h$ arcs is given by

(3.3) $\mathsf{T}_{k,\sigma}^{[4]}(n,h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} \mathsf{C}_k^{*}(n-2b, h-b)$

where $\mathsf{C}_k^{*}(n,h)$ satisfies $\mathsf{C}_k^{*}(n,0) = 1$ and

(3.4) $\mathsf{C}_k^{*}(n,h) = \sum_{b=0}^{h-1} (-1)^{h-b-1} \binom{h-1}{b} \mathsf{T}_k^{*}(n-2h+2b+2, b+1)$ for $h \ge 1$.

Furthermore, $\mathsf{T}_k^{*}(n,h)$ satisfies

(3.5) $\mathsf{T}_k^{*}(n,h) = \sum_{0 \le j_1+j_2+j_3 \le h} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, f_k(n-2j_1-3j_2-4j_3,\ n-2h-j_2-2j_3)$

where

$\lambda(n,j_1,j_2,j_3) = \binom{n - j_1 - 2j_2 - 3j_3}{j_1,\ j_2,\ j_3,\ n - 2j_1 - 3j_2 - 4j_3}.$
n     $\mathsf{T}_{3,3}^{[4]}(n)$    $\mathsf{T}_{3,4}^{[4]}(n)$
8     1        1
9     2        1
10    4        1
11    8        2
12    15       4
13    28       8
14    52       14
15    96       23
16    176      36
17    316      56
18    557      88
19    965      141
20    1660     231
21    2860     382
22    4974     633
23    8754     1038
24    15562    1679

TABLE 6. Exact enumeration: $\mathsf{T}_{3,3}^{[4]}(n)$ and $\mathsf{T}_{3,4}^{[4]}(n)$ for $8 \le n \le 24$, respectively.



In Tab. 6 we display the first numbers of $(3, 4, 3)$-structures and $(3, 4, 4)$-structures, respectively.
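Theorem 3 translates directly into a short program. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the names `perfect_matchings` and `structure_counter` are ours, $f_k(2n,0)$ is generated by counting lattice walks in $x_1 \ge \cdots \ge x_{k-1} \ge 0$, binomial coefficients with negative upper index are read as zero, and the empty structure ($h = 0$) is added separately since the sum in eq. (3.3) is empty in that case.

```python
from functools import lru_cache
from math import comb, factorial


def perfect_matchings(k, n_max):
    """[f_k(0,0), f_k(2,0), ..., f_k(2*n_max,0)] by counting walks in Z^(k-1)."""
    d = k - 1
    origin = (0,) * d
    counts, out = {origin: 1}, [1]
    for step in range(1, 2 * n_max + 1):
        nxt = {}
        for pos, c in counts.items():
            for axis in range(d):
                for delta in (1, -1):
                    p = list(pos)
                    p[axis] += delta
                    if p[-1] < 0 or any(p[a] < p[a + 1] for a in range(d - 1)):
                        continue
                    nxt[tuple(p)] = nxt.get(tuple(p), 0) + c
        counts = nxt
        if step % 2 == 0:
            out.append(counts.get(origin, 0))
    return out


def structure_counter(k, sigma, n_max):
    """Return a function n -> number of (k,4,sigma)-structures over [n], for n <= n_max."""
    f0 = perfect_matchings(k, n_max // 2)

    def f(n, iso):                     # f_k(n, iso) = C(n, iso) f_k(n - iso, 0)
        if n < 0 or iso < 0 or iso > n or (n - iso) % 2:
            return 0
        return comb(n, iso) * f0[(n - iso) // 2]

    def lam(n, j1, j2, j3):            # multinomial coefficient of eq. (3.5)
        rest = n - 2 * j1 - 3 * j2 - 4 * j3
        if rest < 0:
            return 0
        return factorial(n - j1 - 2 * j2 - 3 * j3) // (
            factorial(j1) * factorial(j2) * factorial(j3) * factorial(rest))

    @lru_cache(maxsize=None)
    def t_star(n, h):                  # eq. (3.5), inclusion-exclusion
        if n < 0 or h < 0 or 2 * h > n:
            return 0
        total = 0
        for j1 in range(h + 1):
            for j2 in range(h + 1 - j1):
                for j3 in range(h + 1 - j1 - j2):
                    total += ((-1) ** (j1 + j2 + j3) * lam(n, j1, j2, j3)
                              * f(n - 2 * j1 - 3 * j2 - 4 * j3, n - 2 * h - j2 - 2 * j3))
        return total

    @lru_cache(maxsize=None)
    def c_star(n, h):                  # eq. (3.4), Moebius inversion
        if n < 0 or 2 * h > n:
            return 0
        if h == 0:
            return 1
        return sum((-1) ** (h - b - 1) * comb(h - 1, b) * t_star(n - 2 * h + 2 * b + 2, b + 1)
                   for b in range(h))

    def count(n):                      # eq. (3.3) summed over h, plus the empty structure
        total = 1
        for h in range(1, n // 2 + 1):
            for b in range(sigma - 1, h):
                top = b + (2 - sigma) * (h - b) - 1
                if top >= 0:
                    total += comb(top, h - b - 1) * c_star(n - 2 * b, h - b)
        return total

    return count


count33 = structure_counter(3, 3, 13)
print([count33(n) for n in range(8, 14)])
```

The first values can be compared with Tab. 6; for instance, the value 15 at $n = 12$ consists of the empty structure, 13 structures with a single stack and one structure with two crossing stacks of length three.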

Proof. We first show that there exists a mapping from $(k, 4, \sigma)$-structures with $h$ arcs over $[n]$ into
$\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}_k^{*}(n-2b, h-b)$:

(3.6) $c \colon \mathcal{T}_{k,\sigma}^{[4]}(n,h) \to \bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}_k^{*}(n-2b, h-b), \quad \delta \mapsto c(\delta)$

which is obtained in two steps: first induce $c(\delta)$ by mapping arcs and isolated vertices as follows:

(3.7) $\forall\, \ell \ge \sigma - 1; \quad ((i-\ell, j+\ell), \dots, (i,j)) \mapsto (i,j)$ and $j \mapsto j$ if $j$ is an isolated vertex

and second relabel the resulting diagram from left to right in increasing order, see Fig. 7.
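The two steps of the map $c$ can be sketched in a few lines of Python. This is an illustration only and the function name `core` is ours: every stack is collapsed to its innermost arc as in (3.7), isolated vertices are kept, and the surviving vertices are relabeled from left to right.

```python
def core(arcs, n):
    """Collapse each stack to its innermost arc and relabel; returns (core arcs, core size)."""
    arc_set = set(arcs)
    # an arc is innermost in its stack if (i+1, j-1) is not an arc
    kept = [(i, j) for (i, j) in arc_set if (i + 1, j - 1) not in arc_set]
    paired = {v for a in arc_set for v in a}
    isolated = {v for v in range(1, n + 1) if v not in paired}
    survivors = sorted({v for a in kept for v in a} | isolated)
    relabel = {v: idx for idx, v in enumerate(survivors, start=1)}
    return sorted((relabel[i], relabel[j]) for i, j in kept), len(survivors)


# two crossing stacks of length three over 12 vertices, arc-length >= 4
delta = [(1, 9), (2, 8), (3, 7), (4, 12), (5, 11), (6, 10)]
print(core(delta, 12))
```

For the example, which has arc-length $\ge 4$, the core consists of the two crossing 2-arcs $(1,3)$ and $(2,4)$ over four vertices. Neither of them is a $\beta_2$-arc, since the vertex in between is not isolated, so the core lies in $\mathcal{C}_3^{*}(4,2)$ although it violates the arc-length condition.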
Claim 1. $c \colon \mathcal{T}_{k,\sigma}^{[4]}(n,h) \to \bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}_k^{*}(n-2b, h-b)$ is well-defined and surjective.

FIGURE 7. The mapping $c \colon \mathcal{T}_{k,\sigma}^{[4]}(n,h) \to \bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}_k^{*}(n-2b, h-b)$ is obtained
in two steps: first contraction of the stacks while keeping isolated points and secondly
relabeling of the resulting diagram.

By construction, $c$ does not change the crossing number. Since $\mathcal{T}_{k,\sigma}^{[4]}(n,h)$ contains only arcs of length
$\ge 4$ we derive $c(\mathcal{T}_{k,\sigma}^{[4]}(n,h)) \subset \bigcup_{b} \mathcal{C}_k^{*}(n-2b, h-b)$. Therefore $c$ is well-defined. It remains to show that $c$
is surjective. For this purpose let $\delta \in \mathcal{C}_k^{*}(n-2b, h-b)$ and set $a = b - [\sigma - 1](h-b)$. We proceed
constructing a $k$-noncrossing structure $\tilde\delta$ in three steps:

Step 1. Replace each label $i$ by $\tau_i$, where $\tau_i \le \tau_s$ if and only if $i \le s$.

Step 2. Replace the leftmost arc $(\tau_p, \tau_q)$ by the sequence of arcs

(3.8) $((\tau_p - ([\sigma-1] + a),\ \tau_q + ([\sigma-1] + a)), \dots, (\tau_p, \tau_q))$,

replace any other arc $(\tau_p, \tau_q)$ by the sequence

(3.9) $((\tau_p - [\sigma-1],\ \tau_q + [\sigma-1]), \dots, (\tau_p, \tau_q))$

and each isolated vertex $\tau_s$ by $\tau_s$.

Step 3. Set for $x, y \in \mathbb{Z}$, $\tau_b + y \le \tau_c + x$ if and only if ($b < c$) or ($b = c$ and $y \le x$). By construction,
$\le$ is a linear order over

$n - 2b + 2(h-b)(\sigma-1) + 2a = n - 2b + 2(h-b)(\sigma-1) + 2(b - (\sigma-1)(h-b)) = n$

elements, which we then label from 1 to $n$ (left to right) in increasing order. It is straightforward
to verify that $c(\tilde\delta) = \delta$ holds. It remains to show that $\tilde\delta \in \mathcal{T}_{k,\sigma}^{[4]}(n,h)$. Suppose a contrario $\tilde\delta$ contains
an arc $(i, i+2)$. Since $\sigma \ge 3$ we can then conclude that $i+1$ is necessarily isolated. The arc
$(i, i+2)$ is mapped by $c$ into $(j, j+2)$ with isolated point $j+1$, which is impossible by definition
of $\mathcal{C}_k^{*}(n', h')$. It follows similarly that an arc of the form $(i, i+3)$ cannot be contained in $\tilde\delta$ and
Claim 1 follows.

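Labeling the $h$ arcs of $\delta \in \mathcal{T}_{k,\sigma}^{[4]}(n,h)$ from left to right and keeping track of multiplicities gives rise
to the map

(3.10) $f_{k,\sigma} \colon \mathcal{T}_{k,\sigma}^{[4]}(n,h) \to \bigcup_{\sigma-1 \le b \le h-1} \Big[ \mathcal{C}_k^{*}(n-2b, h-b) \times \Big\{ (a_j)_{1 \le j \le h-b} \ \Big|\ \sum_{j=1}^{h-b} a_j = b,\ a_j \ge \sigma - 1 \Big\} \Big]$

given by $f_{k,\sigma}(\delta) = (c(\delta), (a_j)_{1 \le j \le h-b})$. We can conclude that $f_{k,\sigma}$ is well-defined and a bijection.
We proceed computing the multiplicities of the resulting core-structures [13]:

(3.11) $\Big| \Big\{ (a_j)_{1 \le j \le h-b} \ \Big|\ \sum_{j=1}^{h-b} a_j = b;\ a_j \ge \sigma - 1 \Big\} \Big| = \binom{b + (2-\sigma)(h-b) - 1}{h-b-1}$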
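The binomial coefficient in (3.11) counts the ways of writing $b$ as an ordered sum of $h - b$ integers, each at least $\sigma - 1$: subtracting $\sigma - 1$ from every part leaves a composition of $b - (\sigma-1)(h-b)$ into $h - b$ nonnegative parts. The following Python sketch compares the formula with a brute-force enumeration for small parameters. The function names are ours, and a binomial coefficient with negative upper index is read as zero.

```python
from itertools import product
from math import comb


def multiplicity_brute(b, parts, sigma):
    """Number of tuples (a_1, ..., a_parts) with sum b and every a_j >= sigma - 1."""
    return sum(1 for a in product(range(sigma - 1, b + 1), repeat=parts) if sum(a) == b)


def multiplicity_formula(b, parts, sigma):
    """Right-hand side of (3.11) with parts = h - b."""
    top = b + (2 - sigma) * parts - 1
    return comb(top, parts - 1) if top >= 0 else 0


print(multiplicity_brute(5, 2, 3), multiplicity_formula(5, 2, 3))
```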

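Eq. (3.11) and eq. (3.10) imply

(3.12) $\mathsf{T}_{k,\sigma}^{[4]}(n,h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} \mathsf{C}_k^{*}(n-2b, h-b),$

whence eq. (3.3). Next we consider the map

(3.13) $c^{*} \colon \mathcal{T}_k^{*}(n,h) \to \bigcup_{0 \le b \le h-1} \mathcal{C}_k^{*}(n-2b, h-b), \quad \delta \mapsto c^{*}(\delta).$

Indeed, $c^{*}$ is well defined, since any diagram in $\mathcal{T}_k^{*}(n,h)$ can be mapped into a core structure
without 1- and $\beta$-arcs, i.e. into an element of $\mathcal{C}_k^{*}(n', h')$. That gives rise to

(3.14) $\mathsf{T}_k^{*}(n,h) = \sum_{b=0}^{h-1} \binom{h-1}{b} \mathsf{C}_k^{*}(n-2b, h-b)$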
and via the Möbius inversion formula we obtain eq. (3.4). It is straightforward to show there are
$\lambda(n,j_1,j_2,j_3) = \binom{n - j_1 - 2j_2 - 3j_3}{j_1,\ j_2,\ j_3,\ n - 2j_1 - 3j_2 - 4j_3}$ ways to select $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs over $[n]$.
Since removing $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs removes $2j_1 + 3j_2 + 4j_3$ vertices, the number of
configurations of at least $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs is given by
$\lambda(n,j_1,j_2,j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)$. Via the inclusion-exclusion principle, we arrive at

$\mathsf{T}_k^{*}(n,h) = \sum_{0 \le j_1+j_2+j_3 \le h} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3),$

whence Theorem 3. □



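The following functional identity, relating the bivariate generating functions of $\mathsf{T}_k^{*}(n,h)$ and
$\mathsf{C}_k^{*}(n,h)$, is instrumental for proving our main result in the next section, Theorem 4.

Lemma 1. [13] Let $k, \sigma \in \mathbb{N}$, $k \ge 2$ and let $u, x$ be indeterminates. Suppose we have

(3.15) $\forall\, h \ge 1, \quad A_{k,\sigma}(n,h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} B_k(n-2b, h-b)$ and $A_{k,\sigma}(n,0) = 1$.

Then we have the functional relation

(3.16) $\sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} A_{k,\sigma}(n,h)\, u^h x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} B_k(n,h) \left( \dfrac{u \cdot (u x^2)^{\sigma-1}}{1 - u x^2} \right)^{h} x^n.$

According to Lemma 1, eq. (3.14) and eq. (3.3) we obtain the two functional identities

(3.17) $\sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{T}_k^{*}(n,h)\, u^h x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{C}_k^{*}(n,h) \left( \dfrac{u}{1 - u x^2} \right)^{h} x^n$

(3.18) $\sum_{n \ge 0} \mathsf{T}_{k,\sigma}^{[4]}(n)\, x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{C}_k^{*}(n,h) \left( \dfrac{(x^2)^{\sigma-1}}{1 - x^2} \right)^{h} x^n$ for $\sigma \ge 3$.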
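The substitution appearing in Lemma 1 has a simple interpretation, which we record as a short worked computation (our own reading of the lemma, not taken from [13]): each arc of the core is replaced by a stack of $s \ge \sigma$ arcs, which contributes $u^s$ and inserts $2(s-1)$ additional vertices. Summing over all admissible stack sizes gives a geometric series, and the two special cases used in (3.17) and (3.18) follow by setting $\sigma = 1$ and $u = 1$, respectively.

```latex
\sum_{s \ge \sigma} u^{s} x^{2(s-1)}
  = u\,(u x^{2})^{\sigma-1} \sum_{t \ge 0} (u x^{2})^{t}
  = \frac{u\,(u x^{2})^{\sigma-1}}{1 - u x^{2}},
\qquad
\sigma = 1:\ \frac{u}{1 - u x^{2}},
\qquad
u = 1:\ \frac{(x^{2})^{\sigma-1}}{1 - x^{2}} .
```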



4. Asymptotic Enumeration 



In this section we study the asymptotics of $(k, 4, \sigma)$-structures, where $\sigma \ge 3$. We are particularly
interested in deriving simple formulas that can be used for assessing the complexity of prediction
algorithms for $k$-noncrossing RNA structures. In order to state Theorem 4 below we introduce,
for an indeterminate $w$,

(4.1) $w_0(x) = \dfrac{x^{2\sigma-2}}{1 - x^2 + x^{2\sigma}}$

(4.2) $v(x) = 1 - x + w x^2 + w x^3 + w x^4$

(4.3) $v_0(x) = 1 - x + w_0(x) x^2 + w_0(x) x^3 + w_0(x) x^4.$



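Theorem 4. Let $k, \sigma \in \mathbb{N}$, $k, \sigma \ge 3$, $x$ be an indeterminate and $\rho_k$ the dominant, positive real
singularity of $\sum_{n \ge 0} f_k(2n,0) z^{2n}$. Then $\mathbf{T}_{k,\sigma}^{[4]}(x)$, the generating function of $(k, 4, \sigma)$-structures, is
given by

(4.4) $\mathbf{T}_{k,\sigma}^{[4]}(x) = \dfrac{1}{v_0(x)} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w_0(x)}\, x}{v_0(x)} \right)^{2n}.$

Furthermore

(4.5) $\mathsf{T}_{k,\sigma}^{[4]}(n) \sim c_k\, n^{-(k-1)^2 - \frac{k-1}{2}} \left( \dfrac{1}{\gamma_{k,\sigma}^{[4]}} \right)^{n}$, for $k = 3, 4, \dots, 9$

holds, where $\gamma_{k,\sigma}^{[4]}$ is the positive real dominant singularity of $\mathbf{T}_{k,\sigma}^{[4]}(x)$ and the minimal positive real
solution of the equation $\frac{\sqrt{w_0(x)}\, x}{v_0(x)} = \rho_k$ and $f_k(2n,0) \sim n^{-(k-1)^2 - \frac{k-1}{2}} \left( \frac{1}{\rho_k} \right)^{2n}$ (eq. (2.3)).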



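Proof. In the following we will use the notation $w_0$ instead of $w_0(x)$, eq. (4.1). The first step
derives a functional equation relating the bivariate generating functions of $\mathsf{T}_k^{*}(n,h)$ and $f_k(2h', 0)$.
For this purpose we use eq. (3.5).
Claim 1.

(4.6) $\sum_{n \ge 0} \sum_{h \le \frac{n}{2}} \mathsf{T}_k^{*}(n,h)\, w^h x^n = \dfrac{1}{v(x)} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w}\, x}{v(x)} \right)^{2n}.$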
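Set $\varphi_m(w) = \sum_{h \le \frac{m}{2}} \binom{m}{2h} f_k(2h,0)\, w^h$. In order to prove Claim 1 we compute

$\sum_{n \ge 0} \sum_{h \le \frac{n}{2}} \mathsf{T}_k^{*}(n,h)\, w^h x^n$

$= \sum_{n \ge 0} \sum_{h \le \frac{n}{2}} \sum_{j_1+j_2+j_3 \le h} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)\, w^h x^n$

$= \sum_{n \ge 0} \sum_{j_1+j_2+j_3 \le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, x^n \times \sum_{h \ge j_1+j_2+j_3} f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)\, w^h$

$= \sum_{n \ge 0} \sum_{j_1+j_2+j_3 \le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, x^n \times \sum_{h \ge j_1+j_2+j_3} \binom{n - 2j_1 - 3j_2 - 4j_3}{n - 2h - j_2 - 2j_3} f_k(2(h - j_1 - j_2 - j_3), 0)\, w^h$

$= \sum_{n \ge 0} \sum_{j_1+j_2+j_3 \le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\, \lambda(n,j_1,j_2,j_3)\, w^{j_1+j_2+j_3}\, \varphi_{n - 2j_1 - 3j_2 - 4j_3}(w)\, x^n.$

We interchange the summation over $j_1 + j_2 + j_3$ and $n$ and arrive at

$\sum_{j_1+j_2+j_3 \ge 0}\ \sum_{n \ge 2j_1+3j_2+4j_3} (-1)^{j_1+j_2+j_3} \binom{n - j_1 - 2j_2 - 3j_3}{j_1,\ j_2,\ j_3,\ n - 2j_1 - 3j_2 - 4j_3} w^{j_1+j_2+j_3}\, \varphi_{n - 2j_1 - 3j_2 - 4j_3}(w)\, x^n.$

Setting $m = n - 2j_1 - 3j_2 - 4j_3$ this becomes

$\sum_{m \ge 0} \Big[ \sum_{j_1+j_2+j_3 \ge 0} \binom{m + j_1 + j_2 + j_3}{m,\ j_1,\ j_2,\ j_3} (-w x^2)^{j_1} (-w x^3)^{j_2} (-w x^4)^{j_3} \Big] \varphi_m(w)\, x^m$

$= \sum_{m \ge 0} \left( \dfrac{1}{1 + w x^2 + w x^3 + w x^4} \right)^{m+1} \varphi_m(w)\, x^m$

$= \dfrac{1}{1 + w x^2 + w x^3 + w x^4} \sum_{m \ge 0} \varphi_m(w) \left( \dfrac{x}{1 + w x^2 + w x^3 + w x^4} \right)^{m}.$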
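Next we compute, for an indeterminate $y$,

$\sum_{m \ge 0} \varphi_m(w)\, y^m = \int_0^{\infty} \sum_{m \ge 0} \varphi_m(w) \dfrac{(t y)^m}{m!}\, e^{-t}\, dt$

$= \int_0^{\infty} \sum_{h \ge 0} f_k(2h,0) \dfrac{w^h}{(2h)!} \sum_{m \ge 2h} \dfrac{(t y)^m}{(m - 2h)!}\, e^{-t}\, dt$

$= \sum_{n \ge 0} f_k(2n,0) \dfrac{w^n y^{2n}}{(2n)!} \int_0^{\infty} e^{-(1-y)t}\, t^{2n}\, dt$

$= \sum_{n \ge 0} f_k(2n,0) \dfrac{w^n y^{2n}}{(2n)!} \cdot \dfrac{(2n)!}{(1-y)^{2n+1}}$

$= \dfrac{1}{1-y} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w}\, y}{1-y} \right)^{2n}.$

Substituting $y = \frac{x}{1 + w x^2 + w x^3 + w x^4}$, so that $\frac{y}{1-y} = \frac{x}{v(x)}$, the bivariate generating function can be written as

$\sum_{n \ge 0} \sum_{h \le \frac{n}{2}} \mathsf{T}_k^{*}(n,h)\, w^h x^n = \dfrac{1}{v(x)} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w}\, x}{v(x)} \right)^{2n},$

whence Claim 1. In view of eq. (3.17) and Claim 1 we arrive at

(4.7) $\sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{C}_k^{*}(n,h) \left( \dfrac{w}{1 - w x^2} \right)^{h} x^n = \dfrac{1}{v(x)} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w}\, x}{v(x)} \right)^{2n}.$

By definition of $w_0 = w_0(x)$ we have

(4.8) $\dfrac{w_0}{1 - w_0 x^2} = \dfrac{(x^2)^{\sigma-1}}{1 - x^2}.$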
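According to eq. (3.18), eq. (4.8) and eq. (4.7) this allows us to derive

$\mathbf{T}_{k,\sigma}^{[4]}(x) = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{C}_k^{*}(n,h) \left( \dfrac{(x^2)^{\sigma-1}}{1 - x^2} \right)^{h} x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} \mathsf{C}_k^{*}(n,h) \left( \dfrac{w_0}{1 - w_0 x^2} \right)^{h} x^n = \dfrac{1}{v_0(x)} \sum_{n \ge 0} f_k(2n,0) \left( \dfrac{\sqrt{w_0}\, x}{v_0(x)} \right)^{2n},$

whence (4.4). Let $\mathbf{V}_k(x) = \sum_{n \ge 0} f_k(2n,0) \left( \frac{\sqrt{w_0}\, x}{v_0(x)} \right)^{2n}$.

Claim 2. The unique, minimal, positive, real solution of

(4.9) $\vartheta_\sigma(x) = \dfrac{\sqrt{w_0(x)}\, x}{v_0(x)} = \rho_k$, for $k = 3, 4, \dots, 9$

denoted by $\gamma_{k,\sigma}^{[4]}$ is the unique dominant singularity of $\mathbf{T}_{k,\sigma}^{[4]}(x)$.

Clearly, a dominant singularity of $\frac{1}{v_0(x)} \mathbf{V}_k(x)$ is either a singularity of $\mathbf{V}_k(x)$ or of $\frac{1}{v_0(x)}$. Suppose
there exists some singularity $\zeta \in \mathbb{C}$ which is a pole of $\frac{1}{v_0(x)}$. By construction $\zeta \ne 0$ and $\zeta$ is
necessarily a non-finite singularity of $\mathbf{V}_k(x)$. If $|\zeta| \le \gamma_{k,\sigma}^{[4]}$ then we arrive at the contradiction

$|\mathbf{V}_k(\zeta)| > |\mathbf{V}_k(\gamma_{k,\sigma}^{[4]})| \ge \mathbf{V}_k(|\zeta|)$

since $\mathbf{V}_k(\zeta)$ is not finite and $\mathbf{V}_k(\gamma_{k,\sigma}^{[4]}) = \sum_{n \ge 0} f_k(2n,0)\, \rho_k^{2n} < \infty$. Therefore all dominant
singularities of $\mathbf{T}_{k,\sigma}^{[4]}(x)$ are singularities of $\mathbf{V}_k(x)$. According to Pringsheim's Theorem [53], $\mathbf{T}_{k,\sigma}^{[4]}(x)$ has
a dominant positive real singularity which by construction equals $\gamma_{k,\sigma}^{[4]}$, being the minimal positive
real solution of eq. (4.9). To prove this, we use that for $3 \le k \le 9$, the generating function $\mathbf{F}_k(x)$
has only the two dominant singularities $\pm\rho_k$, see Section 2, Tab. 5. Furthermore we verify that for
$3 \le k \le 9$, $\gamma_{k,\sigma}^{[4]}$ has strictly smaller modulus than all solutions of $\vartheta_\sigma(z) = -\rho_k$, whence Claim 2.
Accordingly, Theorem 2 applies and we have

(4.10) $\mathsf{T}_{k,\sigma}^{[4]}(n) \sim c_k\, n^{-(k-1)^2 - \frac{k-1}{2}} \left( \dfrac{1}{\gamma_{k,\sigma}^{[4]}} \right)^{n}$ for some constant $c_k$

completing the proof of Theorem 4. □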
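The growth rates $1/\gamma_{k,\sigma}^{[4]}$ can be recomputed from eq. (4.9) by elementary root finding. The Python sketch below is an illustration under our own assumptions: the names `theta_arc4`, `theta_arc2` and `growth_rate` are ours, the minimal positive root is located by a linear scan followed by bisection, and the step width of the scan is an ad hoc choice. For comparison the same routine is applied to eq. (2.10), which governs $(k, 2, \sigma)$-structures.

```python
from math import sqrt


def theta_arc4(x, sigma):
    """Left-hand side of eq. (4.9): sqrt(w0) x / v0 for (k,4,sigma)-structures."""
    w0 = x ** (2 * sigma - 2) / (1 - x ** 2 + x ** (2 * sigma))
    v0 = 1 - x + w0 * (x ** 2 + x ** 3 + x ** 4)
    return sqrt(w0) * x / v0


def theta_arc2(x, sigma):
    """Left-hand side of eq. (2.10) for (k,2,sigma)-structures."""
    u0 = x ** (2 * sigma - 2) / (x ** (2 * sigma) - x ** 2 + 1)
    return sqrt(u0) * x / (u0 * x ** 2 - x + 1)


def growth_rate(theta, k, sigma, step=1e-4):
    """1 / gamma, where gamma is the minimal positive solution of theta(x) = rho_k."""
    rho = 1 / (2 * (k - 1))
    lo, hi = 0.0, step
    while theta(hi, sigma) < rho:      # scan for the first crossing of the level rho_k
        lo, hi = hi, hi + step
    for _ in range(60):                # refine by bisection
        mid = (lo + hi) / 2
        if theta(mid, sigma) < rho:
            lo = mid
        else:
            hi = mid
    return 1 / hi


print(round(growth_rate(theta_arc4, 3, 3), 4))
print(round(growth_rate(theta_arc2, 3, 3), 4))
```

The values obtained this way can be compared with the entries of Tab. 1 and Tab. 4; the entry for $k = 3$, $\sigma = 3$ in Tab. 1 is the growth rate 2.0348 quoted in the abstract.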